A look at the mortality due to low physical activity worldwide.
Fortunately, the data used to create this graph is already available on Our World in Data and can be downloaded in CSV format. This dataset was originally published by the Institute for Health Metrics and Evaluation, an American research institute specializing in global health statistics. It consists of the number of deaths attributed to low physical activity per country per year. As the data is already available as a file, I will not use the {owidR} package, which allows loading OWID data directly in R. Let’s import the dataset.
The dataset is already tidy and ready to be used.
To make the later modifications to the original graph easier, we need to add a new column to the dataset with the population of each country. To do so, we need a dataset of the total population of these countries from 1990 to 2019. One can be found via another chart created by Our World in Data.
We need to filter the population data to keep only the years 1990 to 2019. Also, the population dataset contains the total population per continent. These rows are easy to spot: their country code is missing.
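As a minimal sketch of this filtering step, here is the same logic applied to a toy table. The values are invented and `population_toy` is a hypothetical name; only the column names are meant to mirror the real population file.

```r
library(dplyr)
library(tidyr)

# Toy stand-in for the population dataset (values are invented)
population_toy <- tibble(
  Entity = c("France", "France", "Europe", "Spain"),
  Code   = c("FRA", "FRA", NA, "ESP"),
  Year   = c(1989, 2019, 2019, 2000),
  `Population (historical estimates)` = c(58e6, 67e6, 745e6, 40e6)
)

population_clean <-
  population_toy |>
  filter(Year >= 1990 & Year <= 2019) |>  # keep the years 1990 to 2019 only
  drop_na(Code)                            # aggregate rows have no country code

population_clean
```

Only the 2019 row for France and the 2000 row for Spain survive: the 1989 row falls outside the period, and the continent row has no code.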
Now that we have the population dataset containing the total population of countries between 1990 and 2019, we can create a new dataset that joins the data tibble and the population tibble and that contains a column with the total population of each country for each year.
data <-
inner_join(data, population,
by = c('Code'='Code', 'Year'='Year', 'Entity'='Entity')) |>
# Let's rename the columns for better readability
rename(c(Deaths = `Deaths that are from all causes attributed to low physical activity, in both sexes aged all ages`,
Population = `Population (historical estimates)`
)
)
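Because inner_join() silently drops rows that have no match, it can be worth checking which rows would be lost before joining. The sketch below does this with anti_join() on two toy tables; the values and the names `deaths_toy`, `pop_toy` and `unmatched` are invented for illustration.

```r
library(dplyr)

# Toy tables (invented values); the key columns mirror those used in the join above
deaths_toy <- tibble(
  Entity = c("France", "Spain", "Atlantis"),
  Code   = c("FRA", "ESP", "ATL"),
  Year   = c(2019, 2019, 2019),
  Deaths = c(100, 80, 5)
)
pop_toy <- tibble(
  Entity     = c("France", "Spain"),
  Code       = c("FRA", "ESP"),
  Year       = c(2019, 2019),
  Population = c(67e6, 47e6)
)

# Rows of deaths_toy without a match in pop_toy:
# these are the rows an inner_join() on the same keys would drop
unmatched <- anti_join(deaths_toy, pop_toy, by = c("Code", "Year", "Entity"))
unmatched
```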
We are almost done. As we can see on the original Our World in Data graph, the color of each country depends on the range in which its number of deaths due to low physical activity falls for a given year. Therefore, we need to discretize this variable into 9 ranges, as the color scale used by OWID contains 9 ranges. After searching the Web, I found that such an operation can be performed with the discretize() function from the {arules} package.
library(arules)
# We define a vector that contains the values of the edges of each range used in the color scale of the original graph.
breaks_map <- c(0, 500, 1000, 2500, 5000, 10000, 25000, 50000, 100000, 250000)
data <-
data |>
mutate(Deaths_disc = discretize(Deaths, method = 'fixed', breaks = breaks_map),
.after = Deaths)
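To see how fixed breaks behave at the edges, here is a small sketch using base R's cut() on invented values. It is not a reimplementation of discretize(); it only assumes left-closed intervals, which I believe is also what discretize() uses by default.

```r
breaks_map <- c(0, 500, 1000, 2500, 5000, 10000, 25000, 50000, 100000, 250000)

deaths_toy <- c(0, 499, 500, 250000, 300000)  # invented values

# Left-closed intervals [a, b); with include.lowest = TRUE the last
# interval also includes its upper bound
deaths_binned <- cut(deaths_toy, breaks = breaks_map,
                     right = FALSE, include.lowest = TRUE)

as.integer(deaths_binned)  # index of the range each value falls in
nlevels(deaths_binned)     # number of ranges
```

Ten edges give nine ranges, and a value above the last edge (here 300000) becomes NA, so it is worth checking that the largest number of deaths in the data does not exceed 250000.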
To plot a map of the deaths due to low physical activity, we obviously need a map. To build this map, we need geographic data: the longitudes and latitudes describing the shape of each country. We will use the {sf} package to represent this geographic data (which consists of polygons) visually, and the {rnaturalearth} package to obtain it.
First, we use the ne_countries() function from the rnaturalearth package. This function returns vector maps of the world’s countries. We also want our world data to be an sf object, so that we can use the sf package later on. This can be done by providing the argument returnclass = 'sf'.
library(sf)
library(rnaturalearth)
world <- ne_countries(
scale = 'large',
returnclass = 'sf')
We now need to join the world dataset and the data dataset to create the dataset that will be used to draw the world map in ggplot2 and display the number of deaths due to low physical activity at the same time.
Finally, let’s clean up our data dataset a bit so that R has less work to do when running the code that creates the graph. Indeed, the vast majority of columns will not be used to display the data of interest with ggplot. The only columns that we need are:
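A minimal sketch of such a join, using two toy squares instead of the real country polygons. The objects `toy_world` and `toy_deaths`, the codes and the values are invented; the key column `adm0_a3` is the one used for the real join in the recap further down.

```r
library(sf)
library(dplyr)

# Helper building a 1 x 1 square polygon starting at x0 (illustration only)
square <- function(x0) {
  st_polygon(list(rbind(
    c(x0, 0), c(x0 + 1, 0), c(x0 + 1, 1), c(x0, 1), c(x0, 0)
  )))
}

# Toy "world": two polygons identified by a three-letter code
toy_world <- st_sf(
  adm0_a3  = c("AAA", "BBB"),
  geometry = st_sfc(square(0), square(2), crs = 4326)
)

# Toy deaths table: only one of the two countries has data
toy_deaths <- tibble(Code = "AAA", Year = 2019, Deaths = 120)

# Starting from the sf object keeps every polygon and the sf class
joined <- left_join(toy_world, toy_deaths, by = c("adm0_a3" = "Code"))
joined
```

Putting the sf object on the left means that countries without data are kept with NA values, so they can still be drawn on the map.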
geometry, which is used by geom_sf() to plot the shapes of the countries.
Entity: the name of the country
continent
subregion
Year
Deaths
Deaths_disc
Population
From a visual standpoint, the OWID graph doesn’t show Antarctica, so we can remove the rows linked to it from our dataset. Lastly, there are many islands that might clutter our graph. The simplest solution is to remove some of these islands from our data dataset; most of them seem to belong to the Polynesia subregion.
The first thing we need to do is to create a projection of the world. As mentioned earlier, we will use the {sf} package together with the geom_sf() geom, which makes it easy to visualize sf objects. One of the advantages of this geom is that it is smart enough to work out which geometry type to draw from the dataset we provide.
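As a minimal sketch of this first step, the code below draws two toy squares with geom_sf(). The object `toy_map` is invented for illustration; with the real data, the same two ggplot2 calls would be applied to the data dataset.

```r
library(sf)
library(ggplot2)

# Helper building a 10 x 10 degree square starting at longitude x0 (illustration only)
square <- function(x0) {
  st_polygon(list(rbind(
    c(x0, 0), c(x0 + 10, 0), c(x0 + 10, 10), c(x0, 10), c(x0, 0)
  )))
}

toy_map <- st_sf(
  name     = c("A", "B"),
  geometry = st_sfc(square(0), square(20), crs = 4326)
)

# geom_sf() finds the geometry column by itself and draws polygons
p_toy <- ggplot(toy_map) + geom_sf()
p_toy
```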
At this point, we have generated an empty map of the world. One thing we can notice is that the projection (meaning the way a 3D object, the world, is represented in 2D) used by default by geom_sf() is not exactly the same as the one used by Our World In Data. It seems that the latter is a “Robinson” projection. We can use the coord_sf() function to change our projection to a Robinson one.
p <-
p +
coord_sf(crs = "+proj=robin")
p

Let’s add:
The title: “Deaths due to low physical activity, 2019”,
The subtitle: “Estimated annual number of deaths attributed to low physical activity”,
And the caption: “Data source: IHME, Global Burden of Disease (2019) – Learn more about this data”.
After going further, it appeared that the caption was easier to handle if we divided it into two parts: a caption and a tag. Note that for the moment, neither the position of the tag nor the size of its text is right; both will be addressed when setting the theme. For convenience until then, let’s comment out the tag.
plot_title <- "Deaths due to low physical activity, 2019"
plot_subtitle <- "Estimated annual number of deaths attributed to low physical activity."
plot_caption <- "Data source: IHME, Global Burden of Disease (2019) -
Learn more about this data"
plot_tag <- "OurWorldInData.org/causes-of-death | CC BY"
p <-
p +
labs(title = plot_title,
subtitle = plot_subtitle,
caption = plot_caption
#tag = plot_tag
)
p

One thing that we notice is that the OWID graph has a clean white background, without any visible meridians. As the meridians are already drawn in white, adding theme_classic() should give us the look we want. But we also want to get rid of the longitude axis. The most appropriate theme is therefore theme_void().
p <-
p +
theme_void()
Also, the caption of the OWID plot has to be moved to the left, and the font sizes have to be modified so that the subtitle and caption are smaller and the title bigger. Finally, the font of the title can be switched to something approximating Times New Roman. Unfortunately, we were unable to use the latter font (which remains the biggest mystery of my year 2023, as it’s the most basic font ever), so we needed to get an equivalent from Google Fonts: the Merriweather font.
We can also uncomment the tag, and set its size and position in theme(). After unsuccessfully trying several positions for the tag, such as ‘bottom’ and ‘bottomleft’, it appeared that the easiest solution was to set its position manually, using plot.tag.position = 'bottom' (which puts the tag under the caption but in the center of the plot) together with the margin argument of plot.tag = element_text().
library(showtext)
font_add_google('Merriweather', family = 'merriweather')
showtext_auto()
p <-
data |>
ggplot() +
geom_sf() +
coord_sf(crs = '+proj=robin') +
labs(title = plot_title,
subtitle = plot_subtitle,
caption = plot_caption,
tag = plot_tag) + # adding the tag
theme_void() +
theme(plot.title = element_text(family = 'merriweather', size = 15),
plot.caption = element_text(hjust = 0),
plot.tag.position = 'bottom',
plot.tag = element_text(size = 10, # customizing tag's size and position
margin = margin(l = -340)))
p
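As a possible alternative to the negative margin, plot.tag.position also accepts a pair of coordinates between 0 and 1. The sketch below uses the built-in mtcars data and invented coordinates; I have not tested it against the final layout of the map, so treat it as an assumption to experiment with rather than a drop-in replacement.

```r
library(ggplot2)

p_tag <-
  ggplot(mtcars, aes(wt, mpg)) +
  geom_point() +
  labs(caption = "Data source: IHME, Global Burden of Disease (2019)",
       tag = "OurWorldInData.org/causes-of-death | CC BY") +
  theme_void() +
  theme(
    plot.caption = element_text(hjust = 0),
    # x and y between 0 and 1; (0.01, 0.01) is close to the bottom-left corner
    plot.tag.position = c(0.01, 0.01),
    # anchor the tag by its own bottom-left corner
    plot.tag = element_text(size = 10, hjust = 0, vjust = 0)
  )
p_tag
```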

So far, we have a chart that has some common appearance traits shared with the Our World in Data graph. We can now add the information we want to display, focusing on a single year: 2019.
The first step is to filter the data to obtain only the data from 2019. Then, we can add an aesthetic to the geom_sf() geom to display the number of Deaths from low physical activity around the world. Since we want to color the map according to this variable, we will use the fill aesthetic.
p <-
data |>
filter(Year == 2019) |> # filtering by year
ggplot() +
geom_sf(aes(fill = Deaths_disc)) +
coord_sf(crs = '+proj=robin') +
labs(title = plot_title,
subtitle = plot_subtitle,
caption = plot_caption,
tag = plot_tag) +
theme_void() +
theme(plot.title = element_text(family = 'merriweather', size = 15),
plot.caption = element_text(hjust = 0),
plot.tag.position = 'bottom',
plot.tag = element_text(size = 10,
margin = margin(l = -340)))
p

One striking difference with the original plot is the legend. We need to move the legend to the bottom of the chart, change the color palette, and turn the legend keys into a color bar. The latter can be done with a “discretized colourbar guide”: guide_coloursteps(). For the colors, the “YlOrRd” palette, set with scale_fill_brewer(), seems to be exactly the one used by Our World in Data for their map.
p <-
data |>
filter(Year == 2019) |>
ggplot() +
geom_sf(aes(fill = Deaths_disc)) +
coord_sf(crs = '+proj=robin') +
labs(title = plot_title,
subtitle = plot_subtitle,
caption = plot_caption,
tag = plot_tag) +
theme_void() +
theme(
plot.title = element_text(
family = 'merriweather',
size = 15),
plot.caption = element_text(
hjust = 0),
plot.tag.position = 'bottom',
plot.tag = element_text(
size = 10,
margin = margin(l = -340)),
legend.position = 'bottom', # moving the legend around
legend.title = element_blank()) +
scale_fill_brewer(palette = 'YlOrRd',
guide = guide_coloursteps( # Modifying the default legend guide
ticks = TRUE,
barwidth = 25,
barheight = 0.5,
frame.colour = 'black',
frame.linewidth = 0.01,
ticks.colour = 'black',
ticks.linewidth = 0.01)
)
p
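For comparison, ggplot2 also has binned scales that do the discretization themselves. The sketch below is an alternative I did not use for this project: it maps the raw number of deaths of a toy data frame (invented values) to the same palette with scale_fill_fermenter(), assuming the same edges as breaks_map.

```r
library(ggplot2)

breaks_map <- c(0, 500, 1000, 2500, 5000, 10000, 25000, 50000, 100000, 250000)

# Toy data: four tiles with invented numbers of deaths
toy <- data.frame(x = 1:4, y = 1, Deaths = c(120, 3200, 42000, 180000))

p_binned <-
  ggplot(toy, aes(x, y, fill = Deaths)) +
  geom_tile() +
  scale_fill_fermenter(
    palette   = "YlOrRd",
    direction = 1,                                     # light = low, dark = high
    breaks    = breaks_map[-c(1, length(breaks_map))], # inner edges only
    limits    = range(breaks_map),                     # outer edges
    guide     = guide_coloursteps(show.limits = TRUE)
  ) +
  theme(legend.position = "bottom")
p_binned
```

The trade-off is that the ranges then live in the scale rather than in a Deaths_disc column of the dataset.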

Our plot is starting to look similar to the OWID one. Let’s move the legend’s labels above the color bar, and display the limits of the legend. I thought that the first operation could be done directly by modifying legend.text in the theme() function, but that did not seem possible. As a workaround, we can play with the margin argument of element_text() in legend.text to move the legend’s labels around.
The limits are controlled by an argument of the previously used guide_coloursteps(): show.limits.
p <-
data |>
filter(Year == 2019) |>
ggplot() +
geom_sf(
aes(fill = Deaths_disc)) +
coord_sf(crs = '+proj=robin') +
labs(title = plot_title,
subtitle = plot_subtitle,
caption = plot_caption,
tag = plot_tag) +
theme_void() +
theme(
plot.title = element_text(
family = 'merriweather',
size = 15),
plot.caption = element_text(
hjust = 0),
plot.tag.position = 'bottom',
plot.tag = element_text(
size = 10,
margin = margin(l = -340)),
legend.position = 'bottom',
legend.title = element_blank(),
legend.text = element_text(
margin = margin(t = -22) # adjusting the position of legend's labels
)
) +
scale_fill_brewer(palette = 'YlOrRd',
guide = guide_coloursteps(
show.limits = TRUE, # adding the scale limits to the legend
ticks = TRUE,
barwidth = 25,
barheight = 0.5,
frame.colour = 'black',
frame.linewidth = 0.01,
ticks.colour = 'black',
ticks.linewidth = 0.01)
)
p

Now the caption slightly overlaps the legend. We couldn’t fix this in theme() by passing a vjust argument to plot.caption, so we instead added a \n at the beginning of the caption to create an empty line.
plot_caption <- "\nData source: IHME, Global Burden of Disease (2019) - Learn more about this data" # adding a newline character at the beginning of the string to create space over the caption
p <-
data |>
filter(Year == 2019) |>
ggplot() +
geom_sf(
aes(fill = Deaths_disc)) +
coord_sf(crs = '+proj=robin') +
labs(title = plot_title,
subtitle = plot_subtitle,
caption = plot_caption,
tag = plot_tag) +
theme_void() +
theme(
plot.title = element_text(
family = 'merriweather',
size = 15),
plot.caption = element_text(
hjust = 0),
plot.tag.position = 'bottom',
plot.tag = element_text(
size = 10,
margin = margin(l = -340)),
legend.position = 'bottom',
legend.title = element_blank(),
legend.text = element_text(
margin = margin(t = -22)
)
) +
scale_fill_brewer(palette = 'YlOrRd',
guide = guide_coloursteps(
show.limits = TRUE,
ticks = TRUE,
barwidth = 25,
barheight = 0.5,
frame.colour = 'black',
frame.linewidth = 0.01,
ticks.colour = 'black',
ticks.linewidth = 0.01)
)
p

Some adjustments still need to be made:
The borders of the polygons drawn by geom_sf() seem to be a bit thicker than on the original OWID graph.
Many islands remain on the graph, giving it a messier look than the original one.
The title, subtitle, caption and tag need to be spaced a bit from the border of the graph.
For the first two issues, the cause seems to be that the resolution of the geometry variable, which holds the polygons drawn to create the world map, is too high. I tried to use the st_simplify() function from the sf package but wasn’t able to map my data afterwards. However, a much simpler approach worked well: lowering the scale argument of the ne_countries() function. This results in less detail in the geometry variable, and thus in a less messy visual appearance. Of course, this option is not optimal in terms of precision, but Our World in Data also seems to have chosen not to display every country and island worldwide.
The third issue can easily be solved with a few small modifications of the theme(), by passing an hjust argument to plot.title = element_text(), and likewise for the subtitle and caption.
The look is a lot cleaner, with softer edges. Let’s then try to:
Space the title, subtitle, caption, and tag a bit from the left of the plot. This can be achieved by adjusting the hjust argument (margin for the tag) of each of these element_text() calls in the theme().
Make the title bold, the caption the same size as the subtitle, and the tag slightly smaller than the caption. To do so, we will add face and size arguments to element_text() in the theme().
Make the “Data source” string in the caption bold. For this, we will modify the plot_caption variable, load the {ggtext} library, and set plot.caption to ggtext::element_markdown().
Finally, the color of the text is not black in the original plot, but this color: #5b5b5b.
# Recap of the data processing with the modification on the ne_countries() function
data <- read_csv(file = "/08 - Skills Training/03 - UC3M/Data Visualization/Data Visualization - Final Project/Final_Project_Dataset.csv")
population <- read_csv(file = "/08 - Skills Training/03 - UC3M/Data Visualization/Data Visualization - Final Project/Population_per_country_per_year.csv")
population <-
population |>
filter(Year >= 1990 & Year <= 2019) |>
drop_na(Code)
data <-
inner_join(
data,
population,
by=c('Code'='Code', 'Year'='Year', 'Entity'='Entity')) |>
rename(c(Deaths = `Deaths that are from all causes attributed to low physical activity, in both sexes aged all ages`,
Population = `Population (historical estimates)`))
breaks_map <- c(0, 500, 1000, 2500, 5000, 10000, 25000, 50000, 100000, 250000)
data <-
data |>
mutate(Deaths_disc = discretize(
Deaths,
method = 'fixed',
breaks = breaks_map),
.after = Deaths)
world <-
ne_countries(
scale = 'small', # setting the value of the scale from "large" to "small"
returnclass = 'sf'
)
data <-
world |>
left_join(data,
by = c('adm0_a3' = 'Code'))
data <-
data |>
select(Entity,
continent,
subregion,
Year,
Deaths,
Deaths_disc,
Population,
geometry)
data <-
data |>
filter(continent != 'Antarctica') |>
filter(subregion != 'Polynesia')
# The plot, with these changes
library(ggtext)
plot_caption <- "\n**Data source**: IHME, Global Burden
of Disease (2019) - Learn more about this data"
p <-
data |>
filter(Year == 2019) |>
ggplot() +
geom_sf(
aes(fill = Deaths_disc)
) +
coord_sf(crs = '+proj=robin') +
labs(
title = plot_title,
subtitle = plot_subtitle,
caption = plot_caption,
tag = plot_tag
) +
theme_void() +
theme(
plot.title = element_text(
family = 'merriweather',
color = '#5b5b5b',
face = 'bold',
size = 15,
hjust = 0.1,
margin = margin(t = -10)
),
plot.subtitle = element_text(
color = '#5b5b5b',
hjust = 0.11,
size = 11),
plot.caption = ggtext::element_markdown(
color = '#5b5b5b',
hjust = 0.17,
size = 11,
margin = margin(t = 20)
),
plot.tag.position = 'bottom',
plot.tag = element_text(
color = '#5b5b5b',
size = 10,
vjust = -1,
margin = margin(l = -301)
),
legend.position = 'bottom',
legend.title = element_blank(),
legend.text = element_text(
margin = margin(t = -22)
)
) +
scale_fill_brewer(
palette = 'YlOrRd',
guide = guide_coloursteps(
show.limits = TRUE,
ticks = TRUE,
barwidth = 25,
barheight = 0.5,
frame.colour = 'black',
frame.linewidth = 0.01,
ticks.colour = 'black',
ticks.linewidth = 0.01)
)
p

Exactly as in the gapminder project, one of the points of the graph is to display an evolution over time. So far, we’ve been plotting the data only for the year 2019, but it would be nice to create a more interactive plot, just like the original Our World in Data graph, that lets the user filter the data by year and continent. Unfortunately for me, the integration of geom_sf() with the plotly library is not optimal: the resulting plot is dynamic but looks awful, with almost none of the modifications made in theme() taken into account.
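Short of real interactivity, a static fallback is to wrap the plotting code in a function and produce one map per year. The sketch below is only an illustration: `plot_deaths_by_year()` is a hypothetical helper, the toy data is invented, and the theme and legend settings of the final plot are left out for brevity.

```r
library(sf)
library(dplyr)
library(ggplot2)

# Hypothetical helper: one static map for a given year
plot_deaths_by_year <- function(map_data, year) {
  map_data |>
    filter(Year == year) |>
    ggplot() +
    geom_sf(aes(fill = Deaths_disc)) +
    coord_sf(crs = "+proj=robin") +
    labs(title = paste0("Deaths due to low physical activity, ", year))
}

# Toy data: the same square for two years (invented values)
square <- st_polygon(list(rbind(
  c(0, 0), c(10, 0), c(10, 10), c(0, 10), c(0, 0)
)))
toy <- st_sf(
  Year        = c(1990, 2019),
  Deaths_disc = factor(c("[0,500)", "[500,1e+03)")),
  geometry    = st_sfc(square, square, crs = 4326)
)

# One plot per year, stored in a list
maps <- lapply(c(1990, 2019), function(y) plot_deaths_by_year(toy, y))
maps[[2]]
```

This does not replicate the year slider or the continent filter of the OWID chart, but every map keeps the full ggplot2 styling.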